MULTISCALE INFERENCE ABOUT A DENSITY

By Lutz Dümbgen and Günther Walther

University of Bern and Stanford University 

We introduce a multiscale test statistic based on local order 
statistics and spacings that provides simultaneous confidence state- 
ments for the existence and location of local increases and decreases 
of a density or a failure rate. The procedure provides guaranteed 
finite-sample significance levels, is easy to implement and possesses 
certain asymptotic optimality and adaptivity properties. 

1. Introduction. An important aspect in the analysis of univariate data
is inference about qualitative characteristics of their distribution function
$F$ or density $f$, such as the number and location of monotone or convex
regions, local extrema or inflection points. This issue has been addressed
in the literature using a variety of methods. Silverman (1981), Mammen,
Marron and Fisher (1992), Minnotte and Scott (1993), Fisher, Mammen
and Marron (1994), Minnotte (1997), Cheng and Hall (1999) and Chaudhuri
and Marron (1999, 2000) use kernel density estimates. Excess masses
and related ideas are employed by Hartigan and Hartigan (1985), Hartigan
(1987), Müller and Sawitzki (1991), Polonik (1995) and Cheng and Hall
(1998). Good and Gaskins (1980) and Walther (2001) use maximum likelihood
methods, whereas Davies and Kovac (2004) employ the taut string
method. In the present paper, a qualitative analysis of a density $f$ means
simultaneous confidence statements about regions of increase and decrease,
as well as local extrema. Such simultaneous inference has been treated only
sparingly in the literature. Also, the methods available thus far provide
only approximate significance levels as the sample size tends to infinity, and
they rely on certain regularity conditions on $f$.
In this paper, we introduce and analyze a procedure that provides
simultaneous confidence statements with guaranteed given significance level for
arbitrary sample size. The approach is similar to that of Dümbgen (2002),
who used local rank tests in the context of nonparametric regression, or
Chaudhuri and Marron's (1999, 2000) SiZer, where kernel estimators with
a broad range of bandwidths are combined. Here, we utilize test statistics
based on local order statistics and spacings. The use of spacings for
nonparametric inference about densities has a long history. For instance, Pyke (1995)
describes various goodness-of-fit tests based on spacings and Roeder (1992)
uses such tests for inference about normal mixtures. Confidence bands for
an antitonic density on $[0, \infty)$ via uniform order statistics and spacings have
been constructed by Hengartner and Stark (1995) and Dümbgen (1998).

In Section 2, we define local spacings and related test statistics which
indicate isotonic or antitonic trends of $f$ on certain intervals. Then, a
deterministic inequality (Proposition 2.1) relates the joint distribution of all
these test statistics in general to the distribution in the special case of a
uniform density. This enables us to define a multiple test concerning
monotonicity properties of $f$. Roughly speaking, we consider all intervals whose
endpoints are observations. The rationale for using and combining statistics
corresponding to such a large collection of (random) intervals is that
the power for detecting an increase or decrease of $f$ is maximized when
the tested interval is close to an interval on which $f$ has such a trend. In
that context, we also discuss two important differences with Chaudhuri and
Marron's SiZer map.

In Section 3, we describe a particular way of calibrating and combining
the single test statistics. Optimality results in Section 4 show that in many
relevant situations, the resulting multiscale test is asymptotically as powerful,
in the minimax sense, as any procedure can essentially be for detecting
increases and decreases of $f$ on small intervals as well as on large intervals.
Thus, neither the guaranteed confidence level nor the simultaneous consideration
of many intervals results in a substantial loss of power. In addition, we
prove that our procedure is able to detect and localize an arbitrary number
of local extrema under weak assumptions on the strength of these effects.

In Section 5, we consider a density $f$ on $(0, \infty)$ and modify our
multiple test in order to analyze monotonicity properties of the failure rate
$f/(1 - F)$. It is well known that spacings are a useful object in this context;
see, for example, Proschan and Pyke (1967), Bickel and Doksum (1969) and
Barlow and Doksum (1972). While these authors use global test statistics,
Gijbels and Heckman (2004) localize, standardize and combine such tests,
albeit without calibrating the various scales. Hall and van Keilegom (2005)
use resampling from an appropriately calibrated null distribution in order
to achieve better sensitivity in detecting local effects, which leads to an
asymptotically valid test procedure without explicit information about the
location of these effects. Walther (2001) uses a multiscale maximum likelihood
analysis to detect local effects.

Section 6 illustrates the multiscale procedures with two examples and
introduces a graphical display. Proofs and technical arguments are deferred to
Section 7. One essential ingredient is an auxiliary result concerning stochastic
processes with sub-Gaussian marginals and subexponential increments.
This result generalizes Theorem 6.1 of Dümbgen and Spokoiny (2001) and
is a corollary to more general results in a technical report by Dümbgen and
Walther (2006).

To establish notation for the sequel, suppose that $Y_1, Y_2, \ldots, Y_m$ are
independent random variables with unknown distribution function $F$ and
(Lebesgue) density $f$ on the real line. In order to infer properties of $f$ from
these data, we consider the corresponding order statistics
$Y_{(1)} < Y_{(2)} < \cdots < Y_{(m)}$. In some applications, $F$ is known to be
supported by an interval $[a, \infty)$, $(-\infty, b]$ or $[a, b]$, where
$-\infty < a < b < \infty$. In that case, we add the point $Y_{(0)} := a$, the point
$Y_{(m+1)} := b$ or both, respectively, to our ordered sample. This yields a data
vector $\mathbf{X} = (X_{(i)})_{i=0}^{n+1}$ with real components
$X_{(0)} < X_{(1)} < \cdots < X_{(n+1)}$, where $n \in \{m-2, m-1, m\}$. For
$0 \le j < k \le n+1$ with $k - j > 1$, the conditional joint distribution of
$X_{(j+1)}, \ldots, X_{(k-1)}$, given $X_{(j)}$ and $X_{(k)}$, coincides with the joint
distribution of the order statistics of $k - j - 1$ independent random variables
with density

\[ f_{jk}(x) := \frac{1\{x \in \mathcal{I}_{jk}\}\, f(x)}{F(X_{(k)}) - F(X_{(j)})}, \]

where $\mathcal{I}_{jk}$ stands for the interval

\[ \mathcal{I}_{jk} := (X_{(j)}, X_{(k)}). \]

Thus, $(X_{(i+j)})_{i=0}^{k-j}$ is useful for inferring properties of $f$ on
$\mathcal{I}_{jk}$. The multiple tests which follow are based on all such tuples.



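2. Local spacings and monotonicity properties of the density. Let us
consider one particular interval $\mathcal{I}_{jk}$ and condition on its endpoints.
In order to test whether $f$ is nonincreasing or nondecreasing on
$\mathcal{I}_{jk}$, we introduce the local order statistics

\[ X_{(i;j,k)} := \frac{X_{(i)} - X_{(j)}}{X_{(k)} - X_{(j)}}, \qquad j \le i \le k, \]

and the test statistic

\[ T_{jk}(\mathbf{X}) := \sum_{i=j+1}^{k-1} \beta(X_{(i;j,k)}), \]

where

\[ \beta(x) := 1\{x \in (0,1)\}\,(2x - 1). \]

This particular test statistic $T_{jk}(\mathbf{X})$ appears as a locally most
powerful test statistic for the null hypothesis "$\lambda \le 0$" versus
"$\lambda > 0$" in the parametric model, where

\[ f_{jk}(x) = \frac{1\{x \in \mathcal{I}_{jk}\}}{X_{(k)} - X_{(j)}}
   \Bigl(1 + \lambda\Bigl(\frac{x - X_{(j)}}{X_{(k)} - X_{(j)}} - \frac{1}{2}\Bigr)\Bigr). \]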

Elementary algebra yields the following alternative representation of our
single test statistics:

\[ T_{jk}(\mathbf{X}) = -(k-j) \sum_{i=j+1}^{k} \beta\Bigl(\frac{i - j - 1/2}{k-j}\Bigr)
   \bigl(X_{(i;j,k)} - X_{(i-1;j,k)}\bigr). \tag{2.1} \]

Thus, $T_{jk}(\mathbf{X})$ is a weighted average of the local spacings
$X_{(i;j,k)} - X_{(i-1;j,k)}$, $j < i \le k$.
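The equivalence of the defining sum and representation (2.1) can be checked numerically. The following is a minimal sketch, assuming the statistic is the sum of $\beta$ over the interior local order statistics as in Section 2; the function names (`local_order_stats`, `t_stat`, `t_stat_spacings`) are illustrative and not taken from the paper.

```python
import random


def beta(x):
    """Score function beta(x) = 1{0 < x < 1} * (2x - 1)."""
    return 2.0 * x - 1.0 if 0.0 < x < 1.0 else 0.0


def local_order_stats(x, j, k):
    """Rescale the sorted values x[j], ..., x[k] so that they run from 0 to 1."""
    width = x[k] - x[j]
    return [(x[i] - x[j]) / width for i in range(j, k + 1)]


def t_stat(x, j, k):
    """T_jk(X): sum of beta over the interior local order statistics."""
    z = local_order_stats(x, j, k)
    return sum(beta(z[i]) for i in range(1, k - j))


def t_stat_spacings(x, j, k):
    """Representation (2.1): a weighted sum of the local spacings."""
    z = local_order_stats(x, j, k)
    m = k - j
    return -m * sum(beta((i - 0.5) / m) * (z[i] - z[i - 1]) for i in range(1, m + 1))


rng = random.Random(0)
sample = sorted(rng.gauss(0.0, 1.0) for _ in range(30))
difference = t_stat(sample, 3, 17) - t_stat_spacings(sample, 3, 17)
```

Up to floating-point rounding, `difference` should vanish for any strictly increasing input and any admissible pair `(j, k)`.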

Suppose that $f$ is constant on $\mathcal{I}_{jk}$. The random variable
$T_{jk}(\mathbf{X})$ is then distributed (conditionally) as

\[ \sum_{i=1}^{k-j-1} \beta(U_i), \tag{2.2} \]

with independent random variables $U_i$ having uniform distribution on $[0, 1]$.
Note that the latter random variable has mean zero and variance
$(k - j - 1)/3$. However, if $f$ is nondecreasing or nonincreasing on
$\mathcal{I}_{jk}$, then $T_{jk}(\mathbf{X})$ tends to be positive or negative,
respectively. The following proposition provides a more general statement,
which is the key to our multiple test.
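For completeness, the mean and variance of (2.2) follow from a one-line computation for a single uniform variable $U$ (a worked step added here, using only $\beta(u) = 2u - 1$ on $(0,1)$ and the independence of the $U_i$):

```latex
\mathbb{E}\,\beta(U) = \int_0^1 (2u - 1)\,du = 0,
\qquad
\operatorname{Var}\beta(U) = \int_0^1 (2u - 1)^2\,du = \frac{1}{3},
\qquad\text{so}\qquad
\operatorname{Var}\Bigl(\sum_{i=1}^{k-j-1} \beta(U_i)\Bigr) = \frac{k-j-1}{3}.
```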

PROPOSITION 2.1. Define $\mathbf{U} = (U_{(i)})_{i=0}^{n+1}$, with components
$U_{(i)} := F_o(X_{(i)})$, where $F_o$ is the distribution function corresponding
to the density $f_{0,n+1}$. Then, $U_{(1)}, \ldots, U_{(n)}$ are distributed as the
order statistics of $n$ independent random variables having uniform distribution
on $[0,1]$, while $U_{(0)} = 0$ and $U_{(n+1)} = 1$. Moreover, for arbitrary
integers $0 \le j < k \le n+1$ with $k - j > 1$,

\[ T_{jk}(\mathbf{X}) \;
   \begin{cases}
     \ge T_{jk}(\mathbf{U}), & \text{if $f$ is nondecreasing on } \mathcal{I}_{jk}, \\
     \le T_{jk}(\mathbf{U}), & \text{if $f$ is nonincreasing on } \mathcal{I}_{jk}.
   \end{cases} \]



This proposition suggests the following multiple test. Suppose that for a
given level $\alpha \in (0, 1)$, we know constants $c_{jk}(\alpha)$ such that

\[ \mathbb{P}\bigl\{ |T_{jk}(\mathbf{U})| \le c_{jk}(\alpha) \text{ for all } 0 \le j < k \le n+1,\ k - j > 1 \bigr\} \ge 1 - \alpha. \tag{2.3} \]

Let

\[ \mathcal{D}^{\pm}(\alpha) := \bigl\{ \mathcal{I}_{jk} : \pm T_{jk}(\mathbf{X}) > c_{jk}(\alpha) \bigr\}. \]

One can then claim with confidence $1 - \alpha$ that $f$ must have an increase on
every interval in $\mathcal{D}^{+}(\alpha)$ and that it must have a decrease on every
interval in $\mathcal{D}^{-}(\alpha)$. In other words, with confidence $1 - \alpha$,
we may claim that for every $\mathcal{I} \in \mathcal{D}^{\pm}(\alpha)$ and for every
version of $f$, there exist points $x, y \in \mathcal{I}$ with $x < y$
and $\pm(f(y) - f(x)) > 0$.

Combining the two families $\mathcal{D}^{\pm}(\alpha)$ properly allows the detection
and localization of extrema as well. Suppose, for instance, that there exist
intervals $I_1, I_2, \ldots, I_m$ in $\mathcal{D}^{+}(\alpha)$ and
$D_1, D_2, \ldots, D_m$ in $\mathcal{D}^{-}(\alpha)$ such that
$I_1 \le D_1 \le I_2 \le D_2 \le \cdots \le I_m \le D_m$, where the inequalities are
to be understood elementwise. Under the weak assumption that $f$ is continuous, one
can conclude with confidence $1 - \alpha$ that $f$ has at least $m$ different local
maxima and $m - 1$ different local minima.
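Extracting such an alternating sequence from two given families of intervals can be sketched as follows. This is only an illustration under stated assumptions: intervals are represented as `(left, right)` tuples, "$I \le D$ elementwise" is read as `right(I) <= left(D)`, and the greedy rule (always take the admissible interval with the smallest right endpoint) is a choice made here, not a procedure prescribed by the paper. The example intervals are those reported in Section 6.

```python
import math


def alternating_chain(plus, minus, first_sign):
    """Greedily build an alternating sequence of intervals, ordered left to right.

    plus, minus: lists of (left, right) tuples standing in for D+(alpha), D-(alpha).
    first_sign: "+" or "-", the sign of the first interval in the chain.
    """
    pools = {
        "+": sorted(plus, key=lambda iv: iv[1]),
        "-": sorted(minus, key=lambda iv: iv[1]),
    }
    chain = []
    position = -math.inf  # right endpoint of the last chosen interval
    sign = first_sign
    while True:
        # Candidates lie entirely to the right of everything chosen so far.
        candidates = [iv for iv in pools[sign] if iv[0] >= position]
        if not candidates:
            return chain
        chosen = candidates[0]  # smallest right endpoint among the candidates
        chain.append((sign, chosen))
        position = chosen[1]
        sign = "-" if sign == "+" else "+"


decreases = [(0.506, 3.887), (5.022, 5.841)]
increases = [(3.983, 4.882), (5.841, 10.307)]
chain = alternating_chain(increases, decreases, "-")
```

For these four intervals the chain starting with a decrease uses all of them, in the order decrease, increase, decrease, increase.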

Note that our multiscale test allows the combination of test statistics
$T_{jk}(\mathbf{X})$ with arbitrary "scales" $k - j$. This is an advantage over
Chaudhuri and Marron's (1999, 2000) SiZer map, where statements about multiple
increases and decreases are available only at a common bandwidth. This is due
to the fact that these authors use kernels with unbounded support and rely
on a particular variation-reducing property of the Gaussian kernel which
holds only for an arbitrary but global bandwidth. Another consequence of
the kernel's unbounded support is that localizing trends of $f$ itself is not
possible.

3. Properly combining the single test statistics. It remains to define
constants $c_{jk}(\alpha)$ satisfying (2.3). First, note that $T_{jk}(\mathbf{U})$
has mean zero and standard deviation $\sqrt{(k - j - 1)/3}$. Motivated by recent
results of Dümbgen and Spokoiny (2001) concerning multiscale testing in Gaussian
white noise models, we consider the test statistic

\[ T_n(\mathbf{X}) := \max_{0 \le j < k \le n+1 :\, k - j > 1}
   \Bigl( \sqrt{\frac{3}{k-j-1}}\, |T_{jk}(\mathbf{X})| - \Gamma\Bigl(\frac{k-j}{n+1}\Bigr) \Bigr), \]

where $\Gamma(\delta) := (2\log(e/\delta))^{1/2}$. This particular additive
calibration for various scales is necessary for the optimality results to follow.
Without the term $\Gamma((k - j)/(n + 1))$, the null distribution would be dominated
by small scales, as there are many more local test statistics on small scales than
on large scales, with a corresponding loss of power at large scales. The next
theorem states that our particular test statistic $T_n(\mathbf{U})$ converges in
distribution. Unless stated otherwise, asymptotic statements in this paper refer
to $n \to \infty$.

Theorem 3.1.

\[ T_n(\mathbf{U}) \to_{\mathcal{L}} T(W) := \sup_{0 \le u < v \le 1}
   \Bigl( \frac{|Z(u,v)|}{\sqrt{v - u}} - \Gamma(v - u) \Bigr), \]

where

\[ Z(u,v) := 3^{1/2} \int_u^v \beta\Bigl(\frac{x - u}{v - u}\Bigr)\, dW(x) \]

and $W$ is a standard Brownian motion on $[0, 1]$. Moreover, $0 \le T(W) < \infty$
almost surely.

Consequently, if $\kappa_n(\alpha)$ denotes the $(1 - \alpha)$-quantile of
$\mathcal{L}(T_n(\mathbf{U}))$, then $\kappa_n(\alpha) = O(1)$ and the constants

\[ c_{jk}(\alpha) := \sqrt{\frac{k-j-1}{3}}\,
   \Bigl( \kappa_n(\alpha) + \Gamma\Bigl(\frac{k-j}{n+1}\Bigr) \Bigr) \]

satisfy requirement (2.3). For explicit applications, we do not use the limiting
distribution in Theorem 3.1, but rely on Monte Carlo simulations of
$T_n(\mathbf{U})$, which are easily implemented.
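A Monte Carlo approximation of $\kappa_n(\alpha)$ and the resulting constants $c_{jk}(\alpha)$ might look as follows. This is a sketch under assumptions, not the authors' implementation: it uses only the Python standard library, evaluates all $O(n^2)$ index pairs via prefix sums, takes the empirical quantile as the estimate, and assumes the simulated uniforms are distinct (true with probability one). All function names are illustrative.

```python
import math
import random


def gamma_penalty(delta):
    """Additive calibration term Gamma(delta) = sqrt(2 log(e / delta))."""
    return math.sqrt(2.0 * math.log(math.e / delta))


def multiscale_statistic(x):
    """T_n(X) for a strictly increasing list x = (X_(0), ..., X_(n+1))."""
    n_plus_1 = len(x) - 1
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)  # prefix[i] = x[0] + ... + x[i-1]
    best = -math.inf
    for j in range(n_plus_1 - 1):
        for k in range(j + 2, n_plus_1 + 1):
            m = k - j - 1  # number of interior points
            interior_sum = prefix[k] - prefix[j + 1]  # x[j+1] + ... + x[k-1]
            t_jk = 2.0 * (interior_sum - m * x[j]) / (x[k] - x[j]) - m
            value = math.sqrt(3.0 / m) * abs(t_jk) - gamma_penalty((k - j) / n_plus_1)
            best = max(best, value)
    return best


def simulate_null(n, n_sim, rng):
    """Sorted Monte Carlo draws of T_n(U) under the uniform distribution."""
    draws = []
    for _ in range(n_sim):
        u = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
        draws.append(multiscale_statistic(u))
    return sorted(draws)


def kappa(sorted_draws, alpha):
    """Empirical (1 - alpha)-quantile of the simulated statistics."""
    index = math.ceil((1.0 - alpha) * len(sorted_draws)) - 1
    return sorted_draws[min(max(index, 0), len(sorted_draws) - 1)]


def critical_value(j, k, n, kappa_n):
    """c_jk(alpha) = sqrt((k-j-1)/3) * (kappa_n(alpha) + Gamma((k-j)/(n+1)))."""
    return math.sqrt((k - j - 1) / 3.0) * (kappa_n + gamma_penalty((k - j) / (n + 1)))
```

A call such as `kappa(simulate_null(50, 999, random.Random(1)), 0.1)` then gives an estimate of $\kappa_{50}(0.1)$; the number of simulations and the seed are arbitrary choices.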

4. Power considerations. Throughout this section, we focus on the detection
of increases of $f$ by means of $\mathcal{D}^{+}(\alpha)$. Analogous results hold
true for decreases of $f$ and $\mathcal{D}^{-}(\alpha)$.

For any bounded open interval $I \subset \mathbb{R}$, we quantify the isotonicity
of $f$ on $I$ by

\[ \inf_I f' := \inf_{x, y \in I :\, x < y} \frac{f(y) - f(x)}{y - x}
   \quad \bigl( = \inf_{x \in I} f'(x) \text{ if $f$ is differentiable on } I \bigr). \]

We now analyze the difficulty of detecting intervals $I$ with $\inf_I f' > 0$. An
appropriate measure of this difficulty turns out to be

\[ H(f, I) := \inf_I f' \cdot |I|^2 / \sqrt{F(I)}, \]

where $|I|$ denotes the length of $I$. Note that this quantity is affine
equivariant, in the sense that it does not change when $f$ and $I$ are replaced by
$\sigma^{-1} f(\sigma^{-1}(\cdot - \mu))$ and $\{\mu + \sigma x : x \in I\}$,
respectively, with $\mu \in \mathbb{R}$, $\sigma > 0$. For given numbers
$\delta \in (0, 1]$ and $\eta \in \mathbb{R}$, we define

\[ \mathcal{F}(I, \delta, \eta) := \{ f : F(I) = \delta,\ H(f, I) \ge \eta \} \]

and

\[ \mathcal{F}(\delta, \eta) := \bigcup_{\text{bounded intervals } I} \mathcal{F}(I, \delta, \eta). \]

Note that $f(x) \ge \inf_I f' \cdot (x - \inf(I))$ on $I$, so
$F(I) \ge \inf_I f' \cdot |I|^2 / 2$. Hence,

\[ H(f, I) \le 2\sqrt{F(I)}. \tag{4.1} \]

Thus, $\mathcal{F}(I, \delta, \eta)$ and $\mathcal{F}(\delta, \eta)$ are nonvoid if
and only if $\eta \le 2\sqrt{\delta}$.
Theorem 4.1. Let $\delta_n \in (0, 1]$ and $0 < c_n < \sqrt{24} < C_n$.

(a) Let $I_n$ be a bounded interval and $f_n$ a density in
$\mathcal{F}(I_n, \delta_n, C_n \sqrt{\log(e/\delta_n)/n})$. Then,

\[ \mathbb{P}_{f_n}\bigl( \mathcal{D}^{+}(\alpha) \text{ contains an interval } J \subset I_n \bigr) \to 1, \]

provided that $(C_n - \sqrt{24}) \sqrt{\log(e/\delta_n)} \to \infty$.

(b) Let $\phi_n(\mathbf{X})$ be any test with level $\alpha \in (0, 1)$ under the
null hypothesis that $\mathbf{X}$ is drawn from a nonincreasing density. If
$(\log n)^2 / n \le \delta_n \to 0$, then

\[ \inf_{f \in \mathcal{F}(\delta_n,\, c_n \sqrt{\log(e/\delta_n)/n})} \mathbb{E}_f\, \phi_n(\mathbf{X}) \le \alpha + o(1), \]

provided that $(\sqrt{24} - c_n) \sqrt{\log(e/\delta_n)} \to \infty$.

(c) Let $I_n$ be any interval and $b_n$ some number in $[0, 2\sqrt{n\delta_n}]$. If
$\phi_n(\mathbf{X})$ is any test with level $\alpha \in (0, 1)$ under the null
hypothesis that the density is nonincreasing on $I_n$, then

\[ \inf_{f \in \mathcal{F}(I_n,\, \delta_n,\, b_n/\sqrt{n})} \mathbb{E}_f\, \phi_n(\mathbf{X}) \to 1 \]

implies that $b_n \to \infty$ and $n\delta_n \to \infty$.

Theorem 4.1 establishes that our multiscale statistic is optimal, in the
asymptotic minimax sense, for detecting an increase on an unknown interval,
both in the case of an increase occurring on a small scale
($\delta_n \searrow 0$) and in the case of an increase occurring on a large scale
($\liminf \delta_n > 0$).

In the case of small scales, a comparison of (a) and (b) shows that there
is a cut-off for the quantity $H(f, I)$ at $\sqrt{24 \log(e/\delta_n)/n}$: if one
replaces the factor 24 with $24 + \varepsilon_n$, with $\varepsilon_n \searrow 0$
sufficiently slowly, then the multiscale test will detect and localize such an
increase with asymptotic power one, whereas in the case $24 - \varepsilon_n$, no
procedure can detect such an increase with nontrivial asymptotic power.

In the case of large scales, one may replace
$\mathcal{F}(I_n, \delta_n, C_n \sqrt{\log(e/\delta_n)/n})$ in (a) with the family
$\mathcal{F}(I_n, \delta_n, C_n/\sqrt{n})$, where $C_n \to \infty$. Then, a
comparison of (a) and (c) again shows our multiscale test to be optimal, even in
comparison to tests using a priori knowledge of the location and scale of the
potential increase. Hence, searching over all (large and small) scales does not
involve a serious drawback. In the case of small scales, (a) and (c) together show
that ignoring prior information about the location of the potential increase
leads to a penalty factor of order $o(\sqrt{\log(e/\delta_n)}) = o(\sqrt{\log n})$.
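To make the quantities in this discussion concrete, the following sketch evaluates $H(f, I)$ and the small-scale cut-off $\sqrt{24 \log(e/\delta)/n}$ for a hypothetical example chosen here: the density $f(x) = 2x$ on $[0, 1]$ and the interval $I = (0.4, 0.6)$, for which $\inf_I f' = 2$ and $F(I) = 0.6^2 - 0.4^2$. Theorem 4.1 is asymptotic, so the comparison for a finite $n$ is purely illustrative.

```python
import math


def isotonicity(inf_slope, length, mass):
    """H(f, I) = inf_I f' * |I|^2 / sqrt(F(I))."""
    return inf_slope * length ** 2 / math.sqrt(mass)


def small_scale_cutoff(delta, n):
    """Cut-off sqrt(24 * log(e / delta) / n) from the discussion of Theorem 4.1."""
    return math.sqrt(24.0 * math.log(math.e / delta) / n)


# Hypothetical example: f(x) = 2x on [0, 1], I = (0.4, 0.6).
a, b = 0.4, 0.6
mass = b * b - a * a              # F(I) for the distribution function F(x) = x^2
h_value = isotonicity(2.0, b - a, mass)
upper_bound = 2.0 * math.sqrt(mass)   # inequality (4.1)
```

In this example `h_value` lies below `small_scale_cutoff(mass, 1000)` but above `small_scale_cutoff(mass, 5000)`, and it respects the bound (4.1).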

Example 1. Let us first illustrate the theorem in the special case of a
fixed continuous density $f$ and a sequence of intervals $I_n$ converging to a
given point $x_o$, where we use the abbreviation

\[ \rho_n := \log(n)/n. \]
Example 1a. Let $f$ be continuously differentiable in a neighborhood of
$x_o$, such that $f(x_o) > 0$ and $f'(x_o) > 0$. If $|I_n| = D_n \rho_n^{1/3}$ with
$D_n \to D > 0$, then $\delta_n := F(I_n)$ is equal to
$D_n f(x_o) \rho_n^{1/3} (1 + o(1))$ and $\inf_{I_n} f' = f'(x_o) + o(1)$. Hence,
the quantity $H(f, I_n)$ may be written as
$D_n^{3/2} f'(x_o) f(x_o)^{-1/2} \rho_n^{1/2} (1 + o(1))$, while
$\sqrt{24 \log(e/\delta_n)/n} = 8^{1/2} \rho_n^{1/2} (1 + o(1))$. Consequently, the
conclusion of Theorem 4.1(a) is correct if

\[ D_n \searrow \bigl( 8 f(x_o) / f'(x_o)^2 \bigr)^{1/3} \]

sufficiently slowly.

Example 1b. Let $f$ be differentiable on $(x_o, \infty)$, with $f(x_o) = 0$ and
$f'(x_o + h) = \gamma h^{\kappa - 1} (1 + o(1))$ as $h \searrow 0$, where
$\gamma, \kappa > 0$. Defining the interval $I_n$ to be
$[x_o + C_1 \rho_n^{1/(\kappa+1)},\, x_o + C_2 \rho_n^{1/(\kappa+1)}]$ with
$0 < C_1 < C_2$, the conclusion of Theorem 4.1(a) is correct, provided that
$\min(C_1^{\kappa-1}, C_2^{\kappa-1})$ and $C_2/C_1$ are sufficiently large.

Example 1c. Let $f$ be twice continuously differentiable in a neighborhood
of $x_o$, such that $f(x_o) > 0$, $f'(x_o) = 0$ and $\pm f''(x_o) < 0$. Now, take the
two intervals
$I_n^{(\ell)} := [x_o - C_2 \rho_n^{1/5},\, x_o - C_1 \rho_n^{1/5}]$ and
$I_n^{(r)} := [x_o + C_1 \rho_n^{1/5},\, x_o + C_2 \rho_n^{1/5}]$,
with $0 < C_1 < C_2$. If $C_1$ and $C_2/C_1$ are sufficiently large, then it
follows from Theorem 4.1(a) and its extension to locally decreasing densities
that

\[ \mathbb{P}\bigl( \mathcal{D}^{\pm} \text{ contains some } J \subset I_n^{(\ell)}
   \text{ and } \mathcal{D}^{\mp} \text{ contains some } J \subset I_n^{(r)} \bigr) \to 1. \]

Thus, our multiscale procedure will detect the presence of the mode with
asymptotic probability one and furthermore localize it with precision
$O_p((\log(n)/n)^{1/5})$. Up to the logarithmic factor, this is the optimal rate
for estimating the mode [cf. Hasminskii (1979)].

Example 2. Now, let $I$ be a fixed bounded interval and consider a
sequence of densities $f_n$ such that $\sup_{x \in I} |f_n(x) - f_o| \to 0$ for some
constant $f_o > 0$. Here, the conclusion of Theorem 4.1(a) is correct, provided that

\[ \sqrt{n} \cdot \inf_I f_n' \to \infty. \]

The next theorem concerns the simultaneous detection of several increases
of $f$.

Theorem 4.2. Let $f = f_n$ and let $\mathcal{I}_n$ be a collection of
nonoverlapping bounded intervals such that for each $I \in \mathcal{I}_n$,

\[ H(f_n, I) \ge C \bigl( \sqrt{\log(e/F_n(I))} + b_n \bigr) / \sqrt{n} \tag{4.2} \]

with constants $0 < b_n \to \infty$ and $C \ge \sqrt{24}$. Then,

\[ \mathbb{P}_{f_n}\bigl( \text{for each } I \in \mathcal{I}_n,\ \mathcal{D}^{+}(\alpha)
   \text{ contains an interval } J \subset I \bigr) \to 1 \]

in each of the following three settings, where
$\delta_n := \min_{I \in \mathcal{I}_n} F_n(I)$:

(i) $C > 34$;

(ii) $C \ge 2\sqrt{24}$ and $n\delta_n / \log(e \#\mathcal{I}_n) \to \infty$;

(iii) $C = \sqrt{24}$ and $n\delta_n / \log(e \#\mathcal{I}_n) \to \infty$,
$\log \#\mathcal{I}_n = o(b_n^2)$.

It will be shown in Section 7 that (4.2) entails
$n\delta_n \ge (C^2/4 + o(1)) \log n$. In particular, $\#\mathcal{I}_n = o(n)$.
Moreover, Theorem 4.1(a) follows from Theorem 4.2 by considering setting (iii) with
$\mathcal{I}_n$ consisting of a single interval $I_n$.

A comparison with Theorem 4.1(a) shows that the price for the simultaneous
detection of an increasing number of increases or decreases is essentially
a potential increase of the constant $\sqrt{24}$.

The proof of Theorem 4.2 rests on an inequality involving the following
auxiliary functions. For $c \in [-2, 2]$ and $u \in [0, 1]$, let

\[ g_c(u) := 1 + c(u - 1/2). \]

This defines a probability density on $[0, 1]$ with distribution function

\[ G_c(u) := u - c\,u(1 - u)/2. \]

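Proposition 4.1. Define $\mathbf{U} = (U_{(i)})_{i=0}^{n+1}$ as in
Proposition 2.1. For arbitrary integers $0 \le j < k \le n+1$ with $k - j > 1$, it
follows from $\inf_{\mathcal{I}_{jk}} f' \ge 0$ that

\[ T_{jk}(\mathbf{X}) \ge \sum_{i=j+1}^{k-1} \beta\bigl( G_S^{-1}(U_{(i;j,k)}) \bigr)
   \quad \text{with } S := \frac{H(f, \mathcal{I}_{jk})}{\sqrt{F(\mathcal{I}_{jk})}}. \]

Moreover, for any fixed $c \in [-2, 2]$ and $U \sim \mathrm{Unif}[0, 1]$,

\[ \mathbb{E}\,\beta(G_c^{-1}(U)) = c/6, \qquad \operatorname{Var}\bigl(\beta(G_c^{-1}(U))\bigr) \le 1/3, \]

\[ \mathbb{E} \exp\bigl(t \beta(G_c^{-1}(U))\bigr) \le \exp(ct/6 + t^2/6) \quad \text{for all } t \in \mathbb{R}. \]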
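The first two moment statements can be checked numerically. The sketch below assumes the closed-form inverse of $G_c$ obtained by solving the quadratic $u - c\,u(1-u)/2 = p$ (a derivation made here, not quoted from the paper) and approximates expectations over $U \sim \mathrm{Unif}[0,1]$ by a midpoint rule; the function names are illustrative.

```python
import math


def beta(x):
    """Score function beta(x) = 1{0 < x < 1} * (2x - 1)."""
    return 2.0 * x - 1.0 if 0.0 < x < 1.0 else 0.0


def G(c, u):
    """Distribution function G_c(u) = u - c u (1 - u) / 2 on [0, 1]."""
    return u - c * u * (1.0 - u) / 2.0


def G_inverse(c, p):
    """Solve G_c(u) = p for u in [0, 1]; valid for c in [-2, 2]."""
    if p <= 0.0:
        return 0.0
    a = 1.0 - c / 2.0
    # Numerically stable root of (c/2) u^2 + a u - p = 0; reduces to u = p when c = 0.
    return 2.0 * p / (a + math.sqrt(a * a + 2.0 * c * p))


def moment(c, power, grid=20000):
    """Midpoint-rule approximation of E[beta(G_c^{-1}(U))^power], U uniform on [0, 1]."""
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        total += beta(G_inverse(c, p)) ** power
    return total / grid
```

For instance, `moment(1.0, 1)` should be close to $1/6$ and `moment(1.0, 2)` close to $1/3$, in line with the proposition.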

5. Monotonicity of the failure rate. To investigate local monotonicity
properties of the failure rate $f/(1 - F)$, such as the presence of a "burn-in"
period or a "wear-out" period, we consider

\[ W_i := \sum_{k=1}^{i} D_k \Big/ \sum_{k=1}^{n+1} D_k, \qquad i = 0, \ldots, n+1, \]

where $D_i := (n - i + 2)(X_{(i)} - X_{(i-1)})$, $i = 1, \ldots, n+1$, are the
normalized spacings. Here, $X_{(0)} < X_{(1)} < \cdots < X_{(n+1)}$ are the order
statistics of $n + 2$ or $n + 1$ i.i.d. observations from $F$, in the latter case
with $X_{(0)}$ being the left endpoint of the support of $F$. The next proposition
shows that the problem can now be addressed by applying the methodology of
Section 2 to the transformed data vector $\mathbf{W} = (W_i)_{i=0}^{n+1}$.

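Proposition 5.1. Let $X'_{(i)} := -\log(1 - F(X_{(i)}))$, $i = 0, \ldots, n+1$,
and define $\mathbf{W}' = (W'_i)_{i=0}^{n+1}$ analogously to above, with
$\mathbf{X}'$ in place of $\mathbf{X}$. Then,
$\mathbf{W}' =_{\mathcal{L}} \mathbf{U}$ and, for arbitrary integers
$0 \le j < k \le n+1$ with $k - j > 1$,

\[ T_{jk}(\mathbf{W}) \;
   \begin{cases}
     \ge T_{jk}(\mathbf{W}'), & \text{if the failure rate of $f$ is nondecreasing on } \mathcal{I}_{jk}, \\
     \le T_{jk}(\mathbf{W}'), & \text{if the failure rate of $f$ is nonincreasing on } \mathcal{I}_{jk}.
   \end{cases} \]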
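The transformation from the ordered sample to the vector $\mathbf{W}$ is a direct computation on the normalized spacings. A minimal sketch, assuming the input list already contains $X_{(0)}, \ldots, X_{(n+1)}$ in increasing order (the function name is illustrative):

```python
def failure_rate_transform(x):
    """Map sorted data (X_(0), ..., X_(n+1)) to (W_0, ..., W_{n+1}).

    D_i = (n - i + 2) * (X_(i) - X_(i-1)) are the normalized spacings and
    W_i is the cumulative sum of D_1, ..., D_i divided by the total.
    """
    n = len(x) - 2
    spacings = [(n - i + 2) * (x[i] - x[i - 1]) for i in range(1, n + 2)]
    total = sum(spacings)
    w = [0.0]
    running = 0.0
    for d in spacings:
        running += d
        w.append(running / total)
    return w


w_example = failure_rate_transform([0.0, 1.0, 2.0, 3.0])
```

Here `w_example` has normalized spacings 3, 2, 1 and therefore equals `[0, 1/2, 5/6, 1]` up to rounding; the statistics of Section 2 would then be applied to this vector in place of the original data.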

6. Graphical displays and examples. We first illustrate our methodology
with a sample of size $m = 300$ from the mixture distribution

\[ F = 0.3 \cdot \mathrm{Gamma}(2) + 0.2 \cdot \mathcal{N}(5, 0.1) + 0.5 \cdot \mathcal{N}(11, 9), \]

where $\mathrm{Gamma}(2)$ denotes the gamma distribution with density
$g(x) = x e^{-x}$ on $(0, \infty)$. Figure 1 depicts the density $f$ of $F$.

Figure 2 provides a line plot of the data and a visual display of the
multiscale analysis. The horizontal line segments above the line plot depict all
minimal intervals in $\mathcal{D}^{+}(0.1)$, while those below the line plot depict
all minimal intervals in $\mathcal{D}^{-}(0.1)$. Here, we estimated the quantile
$\kappa_{m-2}(0.1)$ to be 1.518 in 9999 Monte Carlo simulations, where we
restricted $(j, k)$ in the definition of $T_n$ to index pairs $(j, k)$ such that
$(k - j)/(m + 1) \le 0.34$. For example, we can conclude with simultaneous
confidence 90% that each of the intervals (0.506, 3.887) and (5.022, 5.841)
contains a decrease and that each of the intervals (3.983, 4.882) and
(5.841, 10.307) contains an increase. As these four intervals are disjoint, we can
conclude with confidence 90% that the density has at least three modes.

Fig. 1. Density of $0.3 \cdot \mathrm{Gamma}(2) + 0.2 \cdot \mathcal{N}(5, 0.1) + 0.5 \cdot \mathcal{N}(11, 9)$.

Fig. 2. Minimal intervals in $\mathcal{D}^{+}(0.1)$ (top) and $\mathcal{D}^{-}(0.1)$ (bottom).

A referee reports that the taut string method of Davies and Kovac (2004)
found three modes in about 82% of cases. Our method finds three modes in
about 39% and exactly two modes in about 50% of the cases. However, the
latter method also allows the localization of the modes. Figure 3 provides
a diagnostic tool for this type of inference. Each horizontal line segment,
labeled by "+" or "-", depicts an interval in some $\mathcal{D}^{+}(\alpha)$
[resp., $\mathcal{D}^{-}(\alpha)$]. In each row, the depicted intervals are
disjoint, with an alternating sequence of signs. The number in the first column
gives the smallest significance level at which this sequence of alternating signs
occurs, and the plot shows all such sequences that have a significance level of
10% or less. The intervals depicted in a given row are chosen to have the smallest
right endpoint among the minimal intervals at the stated level. Consecutive
intervals are plotted with a small vertical offset to more readily show their
endpoints. For example, Figure 3 implies a p-value of less than 1% for the
existence of at least two modes, and a p-value of 7.33% for the existence of at
least three modes.

Fig. 3. Alternating sequences of minimal intervals in $\mathcal{D}^{+}(\alpha)$ and $\mathcal{D}^{-}(\alpha)$ with the corresponding p-values $\alpha$.

Table 1
Proportion of rejections of the null hypothesis at the 5% significance level in 10,000 simulations

  a_1                  -0.2     -0.1     0        0.01
  β = 0                0.014    0.026    0.049    0.052
  β = 0.3, σ = 0.2     0.066    0.115    0.215    0.224
  β = 0.3, σ = 0.1     0.188    0.301    0.439    0.451

Our second example concerns the detection of an increase in a failure
rate. Gijbels and Heckman (2004) compare a global test and four versions
of a localized test in a simulation study. A sample of size $m = 50$ is drawn
from a distribution whose hazard rate $h(t)$ is modeled via
$\log h(t) = a_1 \log t + \beta (2\pi\sigma^2)^{-1/2} \exp\{-(t - \mu)^2/(2\sigma^2)\}$.
Table 1 shows the power of our procedure from Section 5 for the choices of
parameters $a_1, \beta, \sigma$ used by Gijbels and Heckman (2004). The cases with
$\beta = 0$, $a_1 \le 0$ pertain to the null hypothesis of a nonincreasing failure
rate, whereas $\beta = 0$, $a_1 = 0.01$ implies an increasing failure rate. The
other eight cases result in a failure rate with a local increase. The power of
the test introduced in Section 5 exceeds those of the five tests examined by
Gijbels and Heckman (2004) in four of the nine cases that involve an increase in
the failure rate.
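The hazard model of this simulation study is easy to evaluate directly. The sketch below implements the displayed formula for $\log h(t)$; the value of $\mu$ is not stated in this section, so `mu` is left as a free argument and the value used in the example call is a hypothetical placeholder.

```python
import math


def hazard(t, a1, beta, sigma, mu):
    """h(t) with log h(t) = a1 * log(t) + beta * N(t; mu, sigma^2) density, for t > 0."""
    bump = beta * math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
    bump /= math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.exp(a1 * math.log(t) + bump)


# mu = 1.0 is a placeholder chosen here, not a value taken from the study.
null_case = hazard(2.0, a1=-0.2, beta=0.0, sigma=0.2, mu=1.0)
bump_case = hazard(1.0, a1=0.0, beta=0.3, sigma=0.1, mu=1.0)
```

With $\beta = 0$ the hazard reduces to $t^{a_1}$, which is nonincreasing for $a_1 \le 0$; a positive $\beta$ adds a local bump around $\mu$.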

7. Proofs. 

7.1. Proofs of Propositions 2.1, 4.1 and 5.1. The proofs rely on the
following elementary inequality, which we state without proof.

Lemma 7.1. Let $G_o$ and $G$ be distribution functions on an interval $(a, b)$
with densities $g_o$ and $g$, respectively. Suppose that $g - g_o \le 0$ on $(a, c)$
and $g - g_o \ge 0$ on $(c, b)$, where $a < c < b$. Then, $G^{-1} \ge G_o^{-1}$.

Note that the conditions in Lemma 7.1 are satisfied if, for instance, $g_o$
and $g$ are differentiable with derivatives satisfying $g' \ge g_o'$.

Proof of Proposition 2.1. It is well known that $U_{(1)}, \ldots, U_{(n)}$ are
distributed as the order statistics of $n$ independent random variables having
uniform distribution on $[0, 1]$. Suppose that $f$, and thus $f_{jk}$, is
nondecreasing on $\mathcal{I}_{jk}$, where $k - j > 1$. The assumptions of
Lemma 7.1 are then satisfied, with $g = f_{jk}$ and
$g_o(x) := 1\{x \in \mathcal{I}_{jk}\}/|\mathcal{I}_{jk}|$. This implies that for
$j < i < k$,

\[ X_{(i)} = G^{-1}(U_{(i;j,k)}) \ge G_o^{-1}(U_{(i;j,k)}) = X_{(j)} + (X_{(k)} - X_{(j)})\, U_{(i;j,k)}, \]

whence $T_{jk}(\mathbf{X}) \ge T_{jk}(\mathbf{U})$. In the case where $f$ is
nonincreasing on $\mathcal{I}_{jk}$, the reverse inequality
$T_{jk}(\mathbf{X}) \le T_{jk}(\mathbf{U})$ follows from Lemma 7.1 with
$g(x) = 1\{x \in \mathcal{I}_{jk}\}/|\mathcal{I}_{jk}|$ and $g_o := f_{jk}$. □

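Proof of Proposition 4.1. Again, we apply Lemma 7.1, this time
with the densities

\[ g(u) := |\mathcal{I}_{jk}|\, f_{jk}\bigl(X_{(j)} + |\mathcal{I}_{jk}|\, u\bigr) \]

and $g_o := g_S$ on $(0, 1)$. Note that

\[ \inf_{(0,1)} g' = |\mathcal{I}_{jk}|^2 \inf_{\mathcal{I}_{jk}} f_{jk}' = S = g_S'. \]

It thus follows from Lemma 7.1 that

\[ T_{jk}(\mathbf{X}) = \sum_{i=j+1}^{k-1} \beta\bigl(G^{-1}(U_{(i;j,k)})\bigr)
   \ge \sum_{i=j+1}^{k-1} \beta\bigl(G_S^{-1}(U_{(i;j,k)})\bigr). \]

As for the moments of $\beta(G_c^{-1}(U))$, first note that, generally,

\[ \mathbb{E}\, h\bigl(\beta(G_c^{-1}(U))\bigr) = \int_0^1 h(\beta(u)) \bigl(1 + c(u - 1/2)\bigr)\, du \]

for $h : [-1, 1] \to \mathbb{R}$. Letting $h(v) := v^j$ with $j = 1, 2$ shows that
the first and second moments of $\beta(G_c^{-1}(U))$ are given by $c/6$ and $1/3$,
respectively. Moreover, letting $h(v) := \exp(tv)$ yields

\[ M_c(t) := \log \mathbb{E} \exp\bigl(t \beta(G_c^{-1}(U))\bigr) - ct/6 = \log\bigl(A(t) + cB(t)\bigr) - ct/6, \]

where

\[ A(t) := \frac{1}{2} \int_{-1}^{1} e^{tv}\, dv = \frac{\sinh(t)}{t} = \sum_{k=0}^{\infty} \frac{t^{2k}}{(2k+1)!}, \]

\[ B(t) := \frac{1}{4} \int_{-1}^{1} e^{tv}\, v\, dv = \bigl(\cosh(t)/t - \sinh(t)/t^2\bigr)/2
   = \frac{t}{6} \sum_{k=0}^{\infty} \frac{3}{2k+3}\, \frac{t^{2k}}{(2k+1)!}. \]

We have to show that $M_c(t) \le t^2/6$ for any $t \ne 0$. To this end, note that
$\partial M_c(t)/\partial c$ equals $B(t)/(A(t) + cB(t)) - t/6$ and
$\partial^2 M_c(t)/\partial c^2 < 0$. Thus, $M_c(t)$ is strictly concave in
$c \in \{c : A(t) + cB(t) > 0\}$. The equation $\partial M_c(t)/\partial c = 0$
is equivalent to $A(t) + cB(t)$ being equal to $6B(t)/t > 0$ and this means
that $ct/6 = 1 - tA(t)/(6B(t))$. Hence, elementary manipulations of the series
expansions yield

\[ \begin{aligned}
M_c(t) &\le \log\frac{6B(t)}{t} + \frac{tA(t)}{6B(t)} - 1 \\
&= \log\Bigl( \sum_{k=0}^{\infty} \frac{3}{2k+3}\, \frac{t^{2k}}{(2k+1)!} \Bigr)
   + \frac{t^2}{15}\, \sum_{k=0}^{\infty} \frac{5 \cdot 3}{(2k+5)(2k+3)}\, \frac{t^{2k}}{(2k+1)!}
   \Big/ \sum_{k=0}^{\infty} \frac{3}{2k+3}\, \frac{t^{2k}}{(2k+1)!} \\
&\le \frac{t^2}{6}.
\end{aligned} \]

□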



Proof of Proposition 5.1. By construction, the vector
$(X'_{(i)} - X'_{(0)})_{i=1}^{n+1}$ is distributed as the vector of order
statistics of $n + 1$ independent random variables with standard exponential
distribution. Well-known facts imply that the variables $D'_i$ are independent
with standard exponential distribution. Hence,
$(W'_1, \ldots, W'_n) =_{\mathcal{L}} (U_{(1)}, \ldots, U_{(n)})$, while $W'_0 = 0$
and $W'_{n+1} = 1$.

We now assume that the failure rate is nondecreasing on $\mathcal{I}_{jk}$; the
nonincreasing case is treated analogously. The function
$G(x) := -\log(1 - F(x))$ is then convex on $\mathcal{I}_{jk}$. Hence
$a_s := D'_s/D_s$ is nondecreasing in $s \in \{j + 1, \ldots, k\}$.
Consequently, for $j < i < k$,

\[ \begin{aligned}
W_{(i;j,k)} - W'_{(i;j,k)}
&= \frac{\sum_{s=j+1}^{i} D_s}{\sum_{s=j+1}^{k} D_s} - \frac{\sum_{s=j+1}^{i} a_s D_s}{\sum_{s=j+1}^{k} a_s D_s} \\
&= \frac{\sum_{s=j+1}^{i} \sum_{t=i+1}^{k} (a_t - a_s) D_s D_t}{\sum_{s=j+1}^{k} D_s\, \sum_{t=j+1}^{k} a_t D_t} \\
&\ge 0.
\end{aligned} \]

Hence, $T_{jk}(\mathbf{W}) \ge T_{jk}(\mathbf{W}')$. □

7.2. An auxiliary result concerning stochastic processes. Our proof of
Theorem 3.1 builds on a new version of Theorem 6.1 of Dümbgen and
Spokoiny (2001). An important difference is that the original requirement of
sub-Gaussian increments is relaxed to subexponential increments. The new
version itself is just a corollary to more general results concerning stochastic
processes obtained by Dümbgen and Walther (2006). We consider a stochastic
process $Z = (Z(t))_{t \in \mathcal{T}}$ with continuous sample paths on a totally
bounded metric space $(\mathcal{T}, \rho)$, where $\rho \le 1$. "Totally bounded"
means that for any $u > 0$, the capacity number

\[ D(u, \mathcal{T}, \rho) := \sup\bigl\{ \#\mathcal{T}_o : \mathcal{T}_o \subset \mathcal{T}
   \text{ such that } \rho(s, t) > u \text{ for different } s, t \in \mathcal{T}_o \bigr\} \]

is finite. In addition, we consider a function $\sigma : \mathcal{T} \to (0, 1]$,
where $\sigma(t)$ may be viewed as a measure of spread of the distribution of
$Z(t)$. We assume that

\[ |\sigma(s) - \sigma(t)| \le \rho(s, t) \quad \text{for all } s, t \in \mathcal{T} \tag{7.1} \]

and that $\{t \in \mathcal{T} : \sigma(t) \ge \delta\}$ is compact for any
$\delta \in (0, 1]$.

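Theorem 7.1. Suppose that the following three conditions are satisfied:

(i) there exist constants $A, B, V > 0$ such that for arbitrary
$u, \delta \in (0, 1]$,

\[ D\bigl(u\delta, \{t \in \mathcal{T} : \sigma(t) \le \delta\}, \rho\bigr) \le A u^{-B} \delta^{-V}; \]

(ii) there exists a constant $K \ge 1$ such that for arbitrary
$s, t \in \mathcal{T}$ and $\eta \ge 0$,

\[ \mathbb{P}\bigl( |Z(s) - Z(t)| \ge K \rho(s, t)\, \eta \bigr) \le K \exp(-\eta); \]

(iii) for arbitrary $t \in \mathcal{T}$ and $\eta \ge 0$,

\[ \mathbb{P}\bigl( |Z(t)| \ge \sigma(t)\, \eta \bigr) \le 2 \exp(-\eta^2/2). \]

Then,

\[ \mathbb{P}\Bigl( \sup_{s, t \in \mathcal{T}} \frac{|Z(s) - Z(t)|}{\rho(s, t) \log(e/\rho(s, t))} \ge \eta \Bigr) \le p_1(\eta), \]

\[ \mathbb{P}\Bigl( \sup_{t \in \mathcal{T}} \frac{|Z(t)|/\sigma(t) - \sqrt{2V \log(1/\sigma(t))}}{D(\sigma(t))} \ge \eta \Bigr) \le p_2(\eta), \]

with $D(\delta) := \log(e/\delta)^{-1/2} \log(e \log(e/\delta))$, where
$p_1 = p_1(\cdot \,|\, A, B, K)$ and $p_2 = p_2(\cdot \,|\, A, B, V, K)$ are universal
functions such that $\lim_{\eta \to \infty} p_j(\eta) = 0$.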

7.3. Proof of Theorem 3.1. In what follows, we describe a proof, but
omit some technical arguments and details; for a complete account, we refer
to Dümbgen and Walther (2006).

We embed our test statistics $T_{jk}$ into a stochastic process $Z_n$ on

\[ \mathcal{T}_n := \{ (\tau_{jn}, \tau_{kn}) : 0 \le j < k \le n+1 \}, \]

where $\tau_{in} := i/(n + 1)$, equipped with the distance

\[ \rho\bigl((u, v), (u', v')\bigr) := \bigl( |u - u'| + |v - v'| \bigr)^{1/2} \]

on $\mathcal{T} := \{(u, v) : 0 \le u < v \le 1\}$. Namely, let

\[ Z_n(\tau_{jn}, \tau_{kn}) := 3^{1/2} (n + 1)^{-1/2}\, T_{jk}(\mathbf{U}). \]

Moreover, for $(u, v) \in \mathcal{T} \setminus \mathcal{T}_n$, let

\[ Z_n(u, v) := Z_n(\tau_n(u), \tau_n(v)) \quad \text{with } \tau_n(c) := \frac{\lfloor (n + 1) c \rfloor}{n + 1}. \]

Note that

\[ \mathbb{E}(Z_n(u, v)) = 0 \quad \text{and} \quad \operatorname{Var}(Z_n(u, v)) \le \sigma(u, v)^2, \]

where $\sigma(u, v) := (v - u)^{1/2}$. Elementary calculations show that these
functions $\rho$ and $\sigma$ satisfy (7.1). Later, we shall prove the two
following results concerning these processes $Z_n$ and the limiting process $Z$
defined in Theorem 3.1.

Lemma 7.2. The processes $Z$ on $\mathcal{T}$ and $Z_n$ on $\mathcal{T}_n$
($n \in \mathbb{N}$) satisfy conditions (i)-(iii) of Theorem 7.1 with $A = 12$,
$B = 4$, $V = 2$ and some universal constant $K$.

Lemma 7.3. For any finite subset $\mathcal{T}_o$ of $\mathcal{T}$, the random
variable $(Z_n(t))_{t \in \mathcal{T}_o}$ converges in distribution to
$(Z(t))_{t \in \mathcal{T}_o}$.
With arguments similar to those in Dümbgen (2002), one can deduce from
Theorem 7.1 and Lemmas 7.2-7.3 that the preliminary test statistic

\[ \tilde{T}_n := \max_{0 \le j < k \le n+1}
   \Bigl( 3^{1/2} (k - j)^{-1/2}\, |T_{jk}(\mathbf{U})| - \Gamma\Bigl(\frac{k - j}{n + 1}\Bigr) \Bigr)
   = \max_{t \in \mathcal{T}_n} \Bigl( \frac{|Z_n(t)|}{\sigma(t)} - \Gamma(\sigma(t)^2) \Bigr) \]

[with $T_{jk}(\mathbf{U}) := 0$ if $k - j = 1$] converges in distribution to
$T(W)$. Moreover,

\[ T_n(\mathbf{U}) = \max_{t \in \mathcal{T}_n} \Bigl( \frac{|Z_n(t)|}{\sigma_n(t)} - \Gamma(\sigma(t)^2) \Bigr) \]

with

\[ \sigma_n(t) := \bigl( \sigma(t)^2 - (n + 1)^{-1} \bigr)^{1/2}, \]

where we use the convention that $0/0 := 0$. By means of the inequality
$|Z_n(t)| \le (n + 1)^{1/2} \sigma_n(t)^2$ and elementary considerations, one can
show that $T_n(\mathbf{U}) = \tilde{T}_n + o_p(1)$, whence
$T_n(\mathbf{U}) \to_{\mathcal{L}} T(W)$.

Proof of Lemma 7.2. A proof of condition (i) is given by Dümbgen and Spokoiny (2001) (proof of Theorem 2.1) in a slightly different setting.

Next, we verify condition (ii). In order to bound the increment $Z_n(s) - Z_n(t)$ in terms of $\rho(s,t)$, we first consider the special case of $s = (0,1)$ and $t = (\tau,1)$, where $\tau = \tau_{kn}$ for some $k\in\{1,\dots,n\}$. Note that

$$\sum_{i=1}^{n}(2U_{(i)}-1) = \sum_{i=1}^{k-1}(2U_{(i)}-1) + 2U_{(k)} - 1 + \sum_{i=k+1}^{n}(2U_{(i)}-1),$$

$$\sum_{i=1}^{k-1}(2U_{(i)}-1) = \sum_{i=1}^{k-1}\Bigl(2\frac{U_{(i)}}{U_{(k)}}-1\Bigr)U_{(k)} + (k-1)(U_{(k)}-1),$$

$$\sum_{i=k+1}^{n}(2U_{(i)}-1) = \sum_{i=k+1}^{n}(2(U_{(i)}-U_{(k)})-1) + 2(n-k)U_{(k)} = \sum_{i=k+1}^{n}\Bigl(2\frac{U_{(i)}-U_{(k)}}{1-U_{(k)}}-1\Bigr)(1-U_{(k)}) + (n-k)U_{(k)},$$

whence

$$Z_n(0,1) = Z_n(0,\tau)U_{(k)} + Z_n(\tau,1)(1-U_{(k)}) + 3^{1/2}(n+1)^{1/2}(U_{(k)}-\tau).$$
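
The decomposition of $Z_n(0,1)$ can be checked numerically. The following is a minimal sketch, assuming that $Z_n(\tau_{jn},\tau_{kn}) = 3^{1/2}(n+1)^{-1/2}\sum_{i=j+1}^{k-1}\beta(U_{(i;j,k)})$ with $\beta(x) = 2x-1$, $U_{(i;j,k)} = (U_{(i)}-U_{(j)})/(U_{(k)}-U_{(j)})$, $U_{(0)} := 0$ and $U_{(n+1)} := 1$; this form is inferred from the sums displayed here, and the function names are illustrative only.

```python
import math
import random

def z_n(u, j, k):
    """Z_n(tau_jn, tau_kn) for padded order statistics u[0] = 0 < ... < u[n+1] = 1 (assumed form)."""
    n = len(u) - 2
    span = u[k] - u[j]
    # sum of beta(x) = 2x - 1 over the local order statistics strictly between j and k
    t_jk = sum(2.0 * (u[i] - u[j]) / span - 1.0 for i in range(j + 1, k))
    return math.sqrt(3.0 / (n + 1)) * t_jk

def decomposition_gap(n, k, seed=0):
    """Absolute difference between both sides of the identity, split at tau = k / (n + 1)."""
    rng = random.Random(seed)
    u = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    tau = k / (n + 1)
    lhs = z_n(u, 0, n + 1)
    rhs = (z_n(u, 0, k) * u[k]
           + z_n(u, k, n + 1) * (1.0 - u[k])
           + math.sqrt(3.0 * (n + 1)) * (u[k] - tau))
    return abs(lhs - rhs)

# The gap should be at the level of floating-point rounding for every split point k.
gaps = [decomposition_gap(n=25, k=k, seed=k) for k in range(1, 26)]
print(max(gaps))
```

The identity is purely algebraic, so it holds for every sample and every $k$, not only in distribution.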
Consequently,

$$\begin{aligned}
Z_n(0,1) - Z_n(\tau,1)
&= (Z_n(0,\tau) - Z_n(\tau,1))\,U_{(k)} + 3^{1/2}(n+1)^{1/2}(U_{(k)}-\tau) \\
&= 3^{1/2}(n+1)^{-1/2}\Bigl(\sum_{i=1}^{k-1}\beta\Bigl(\frac{U_{(i)}}{U_{(k)}}\Bigr) - \sum_{i=k+1}^{n}\beta\Bigl(\frac{U_{(i)}-U_{(k)}}{1-U_{(k)}}\Bigr)\Bigr)U_{(k)} + 3^{1/2}(n+1)^{1/2}(U_{(k)}-\tau) \\
&=_{\mathcal L} 3^{1/2}(n+1)^{-1/2}\sum_{i=1}^{n-1}\beta(U_i')\,U_{(k)} + 3^{1/2}(n+1)^{1/2}(U_{(k)}-\tau),
\end{aligned}$$

where $U_1,\dots,U_n,U_1',\dots,U_n'$ are independent and identically distributed.
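Note that $U_{(k)}$ has a beta distribution with parameters $k$ and $n+1-k$. This entails that

$$\mathbb P\{\pm(U_{(k)}-\tau)\ge c\} \le \exp(-(n+1)\Psi(\tau\pm c,\tau)) \quad\text{for all } c > 0,$$

where $\Psi(x,\tau) := \tau\log(\tau/x) + (1-\tau)\log((1-\tau)/(1-x))$ if $x\in(0,1)$, and $\Psi(x,\tau) := \infty$ otherwise; see Proposition 2.1 of Dümbgen (1998). Elementary calculations show that $\Psi(\tau\pm c,\tau)$ is not smaller than $c^2/(2\tau(1-\tau)+2c)$, whence

$$\mathbb P\{\pm(U_{(k)}-\tau)\ge c\} \le \exp\Bigl(-\frac{(n+1)c^2}{2\tau(1-\tau)+2c}\Bigr) \tag{7.2}$$

for all $c > 0$. Consequently, for any $\eta > 0$,

$$\begin{aligned}
\mathbb P\{|3^{1/2}(n+1)^{1/2}(U_{(k)}-\tau)| \ge \eta\rho((0,1),(\tau,1))\}
&= \mathbb P\{|3^{1/2}(n+1)^{1/2}(U_{(k)}-\tau)| \ge \eta\tau^{1/2}\} \\
&= \mathbb P\Bigl\{|U_{(k)}-\tau| \ge \frac{\eta\tau^{1/2}}{3^{1/2}(n+1)^{1/2}}\Bigr\} \\
&\le 2\exp\Bigl(-\frac{\eta^2\tau}{6\tau(1-\tau)+12^{1/2}\eta(n+1)^{-1/2}\tau^{1/2}}\Bigr) \\
&\le 2\exp\Bigl(-\frac{\eta^2}{6+12^{1/2}\eta}\Bigr) \\
&\le 4\exp(-\eta/4).
\end{aligned} \tag{7.3}$$

Here, we used the fact that $(n+1)\tau \ge 1$. Moreover, for any $\eta \ge 1$,

$$\begin{aligned}
\mathbb P\Bigl\{\Bigl|3^{1/2}(n+1)^{-1/2}\sum_{i=1}^{n-1}\beta(U_i')\,U_{(k)}\Bigr| \ge \eta\tau^{1/2}\Bigr\}
&\le \mathbb P\Bigl\{\Bigl|(3/n)^{1/2}\sum_{i=1}^{n-1}\beta(U_i')\Bigr| \ge \eta^{1/2}\Bigr\} + \mathbb P\{U_{(k)} \ge \eta^{1/2}\tau^{1/2}\} \\
&\le 2\exp(-\eta/2) + \mathbb P\{U_{(k)}-\tau \ge \eta^{1/2}\tau^{1/2}-\tau\} \\
&\le 2\exp(-\eta/2) + \exp\Bigl(-\frac{(n+1)(\eta^{1/2}-1)^2\tau}{2\tau(1-\tau)+2(\eta^{1/2}-1)\tau^{1/2}}\Bigr) \\
&\le 2\exp(-\eta/2) + \exp\Bigl(-\frac{(n+1)(\eta^{1/2}-1)^2\tau^{1/2}}{2+2(\eta^{1/2}-1)}\Bigr) \\
&\le 2\exp(-\eta/2) + \exp\Bigl(-\frac{(n+1)^{1/2}(\eta^{1/2}-1)^2}{2\eta^{1/2}}\Bigr).
\end{aligned}$$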


Note that the probability in question is zero if $\eta$ is greater than $3^{1/2}(n+1)^{-1/2}(n-1)\tau^{-1/2}$, and the latter number is smaller than $3^{1/2}n$. Thus, suppose that $\eta < 3^{1/2}n$. Then,

$$\frac{(n+1)^{1/2}(\eta^{1/2}-1)^2}{2\eta^{1/2}} \ge \frac{(3^{-1/2}\eta+1)^{1/2}(\eta^{1/2}-1)^2}{2\eta^{1/2}}.$$

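Consequently, for all $\eta \ge 0$ and some positive constant $C_1$,

$$\mathbb P\Bigl\{\Bigl|3^{1/2}(n+1)^{-1/2}\sum_{i=1}^{n-1}\beta(U_i')\,U_{(k)}\Bigr| \ge \eta\tau^{1/2}\Bigr\} \le C_1\exp(-\eta/C_1). \tag{7.4}$$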
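
The beta tail bound (7.2) that drives these estimates can be compared with exact probabilities. The sketch below uses the standard fact that the $k$-th order statistic of $n$ uniform variables satisfies $U_{(k)} \ge x$ exactly when at most $k-1$ of the points fall below $x$; the function names and the grid of values for $c$ are illustrative choices, not part of the original argument.

```python
import math

def order_stat_upper_tail(n, k, x):
    """P(U_(k) >= x) for the k-th order statistic of n i.i.d. Unif(0,1) variables."""
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    # U_(k) >= x holds exactly when at most k - 1 of the n points fall below x
    return sum(math.comb(n, i) * x**i * (1.0 - x)**(n - i) for i in range(k))

def tail_bound(n, tau, c):
    """Right-hand side of (7.2)."""
    return math.exp(-(n + 1) * c * c / (2.0 * tau * (1.0 - tau) + 2.0 * c))

def worst_violation(n, cs=(0.01, 0.05, 0.1, 0.2, 0.4)):
    """Largest value of (exact tail - bound) over k and c; nonpositive if (7.2) holds."""
    worst = -math.inf
    for k in range(1, n + 1):
        tau = k / (n + 1)
        for c in cs:
            upper = order_stat_upper_tail(n, k, tau + c)        # P(U_(k) - tau >= c)
            lower = 1.0 - order_stat_upper_tail(n, k, tau - c)  # P(U_(k) - tau <= -c)
            bound = tail_bound(n, tau, c)
            worst = max(worst, upper - bound, lower - bound)
    return worst

print(worst_violation(20))
```

A nonpositive value (up to rounding) means that both tails respect (7.2) on the chosen grid; this is a sanity check, not a proof.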



Combining (7.3) and (7.4) yields

$$\mathbb P\{|Z_n(0,1) - Z_n(\tau_{kn},1)| \ge \eta\rho((0,1),(\tau_{kn},1))\} \le C_2\exp(-\eta/C_2) \tag{7.5}$$

for some positive constant $C_2$. Symmetry considerations show that the same bound applies to $s = (0,1)$ and $t = (0,\tau)$, that is,

$$\mathbb P\{|Z_n(0,1) - Z_n(0,\tau_{kn})| \ge \eta\rho((0,\tau_{kn}),(0,1))\} \le C_2\exp(-\eta/C_2). \tag{7.6}$$

In order to treat the general case, note that the processes $Z_n$ rescale as follows. For $0 \le J < K \le n+1$,

$$(Z_n(\tau_{J+j,n},\tau_{J+k,n}))_{0\le j<k\le K-J} =_{\mathcal L} \sigma(\tau_{Jn},\tau_{Kn})\,(Z_{K-J}(\tau_{j,K-J},\tau_{k,K-J}))_{0\le j<k\le K-J},$$

while for $0\le j<k\le K-J$ and $0\le j'<k'\le K-J$,

$$\rho((\tau_{J+j,n},\tau_{J+k,n}),(\tau_{J+j',n},\tau_{J+k',n})) = \sigma(\tau_{Jn},\tau_{Kn})\,\rho((\tau_{j,K-J},\tau_{k,K-J}),(\tau_{j',K-J},\tau_{k',K-J})).$$

With this rescaling, one can easily verify condition (ii) with $K = 2C_2$.

Finally, according to Proposition 4.1, $\mathbb E\exp(r\beta(U_i)) \le \exp(r^2/6)$ for all $r\in\mathbb R$, whence

$$\mathbb E\exp(r\sigma(t)^{-1}Z_n(t)) \le \exp(r^2/2) \quad\text{for } r\in\mathbb R,\ t\in\mathcal T_n.$$

A standard argument involving Markov's inequality then yields condition (iii). □

Proof of Lemma 7.3. Recall the representation $U_{(i)} - U_{(i-1)} = E_i/S_n$ with independent, standard exponential variables $E_i$ and $S_n = \sum_{j=1}^{n+1}E_j$. Starting from (2.1), one can write



$$Z_n(\tau_{jn},\tau_{kn}) = -3^{1/2}(n+1)^{-1/2}\,\frac{k-j}{U_{(k)}-U_{(j)}}\sum_{i=j+1}^{k}\beta\Bigl(\frac{i-j-1/2}{k-j}\Bigr)\frac{E_i}{S_n} = \frac{\tau_{kn}-\tau_{jn}}{U_{(k)}-U_{(j)}}\cdot\frac{n+1}{S_n}\times\tilde Z_n(\tau_{jn},\tau_{kn}),$$

where

$$\tilde Z_n(\tau_{jn},\tau_{kn}) := 3^{1/2}(n+1)^{-1/2}\sum_{i=j+1}^{k}\beta\Bigl(\frac{i-j-1/2}{k-j}\Bigr)(1-E_i)$$

and $S_n := \sum_{i=1}^{n+1}E_i$. The centering of the variables $E_i$ is possible because the sum of the coefficients $\beta((i-j-1/2)/(k-j))$, $j < i \le k$, is zero. Since $S_n/(n+1) \to_p 1$ and $\max_{i\le n}|U_{(i)} - \tau_{in}| \to_p 0$, it suffices to consider the stochastic process $\tilde Z_n$ in place of $Z_n$. But the assertion then follows from the multivariate version of Lindeberg's central limit theorem and elementary covariance calculations. □
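
The two ingredients of this proof, the exponential-spacings representation and the vanishing coefficient sum, can be illustrated with a short simulation. This is a sketch under assumptions: (2.1) is not reproduced in this section, so the spacing form used below, $T_{jk} = -\frac{k-j}{U_{(k)}-U_{(j)}}\sum_{i=j+1}^{k}\beta(\frac{i-j-1/2}{k-j})(U_{(i)}-U_{(i-1)})$ with $\beta(x) = 2x-1$, is an assumed reading of it, and all function names are hypothetical.

```python
import random

def beta(x):
    return 2.0 * x - 1.0

def padded_order_stats(n, rng):
    """Return (u, e, s) with u[0] = 0 < u[1] < ... < u[n+1] = 1 built from n + 1 exponential spacings."""
    e = [rng.expovariate(1.0) for _ in range(n + 1)]   # E_1, ..., E_{n+1}
    s = sum(e)                                          # S_n
    u, acc = [0.0], 0.0
    for e_i in e:
        acc += e_i
        u.append(acc / s)                               # U_(i) = (E_1 + ... + E_i) / S_n
    return u, e, s

def t_direct(u, j, k):
    """Sum of beta over the local order statistics strictly between j and k."""
    span = u[k] - u[j]
    return sum(beta((u[i] - u[j]) / span) for i in range(j + 1, k))

def t_from_spacings(u, e, s, j, k):
    """Assumed spacing form of the same statistic (E_i is stored as e[i - 1])."""
    weighted = sum(beta((i - j - 0.5) / (k - j)) * e[i - 1] / s for i in range(j + 1, k + 1))
    return -(k - j) / (u[k] - u[j]) * weighted

def coefficient_sum(j, k):
    """Sum of the coefficients beta((i - j - 1/2) / (k - j)) over j < i <= k."""
    return sum(beta((i - j - 0.5) / (k - j)) for i in range(j + 1, k + 1))

rng = random.Random(1)
u, e, s = padded_order_stats(30, rng)
gap = max(abs(t_direct(u, j, k) - t_from_spacings(u, e, s, j, k))
          for j in range(0, 31) for k in range(j + 1, 32))
print(gap, coefficient_sum(3, 17))
```

Both printed numbers should be zero up to rounding: the first confirms that the two forms of the statistic agree, the second that the coefficients sum to zero, which is what permits replacing $E_i$ by $E_i - 1$.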

7.4. Proofs for Section 4. We first prove the lower bounds comprising Theorem 4.1(b)-(c). The following lemma is a surrogate for Lemma 6.2 of Dümbgen and Spokoiny (2001) in order to treat likelihood ratios and i.i.d. data.

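Lemma 7.4. Let $X_1, X_2, \dots, X_n$ be i.i.d. with distribution $P$ on some measurable space $\mathcal X$. Let $f_1,\dots,f_m$ be probability densities with respect to $P$ such that the sets $B_j := \{f_j \ne 1\}$ are pairwise disjoint and define $L_j := \prod_{i=1}^n f_j(X_i)$. Then,

$$\mathbb E\Bigl|m^{-1}\sum_{j=1}^m L_j - 1\Bigr| \to 0,$$

provided that $m\to\infty$, $\Delta_\infty \le C(\log m)^{-1/2}$ for some fixed constant $C$ and

$$\sqrt{\log m}\,\Bigl(1-\frac{n\Delta_2^2}{2\log m}\Bigr) \to \infty,$$

where $\Delta_\infty := \max_j\sup_x|f_j(x)-1|$ and $\Delta_2 := \max_j(\int(f_j-1)^2\,dP)^{1/2}$.

Proof of Lemma 7.4. The likelihood ratio statistics $L_j$ are not stochastically independent, but conditional on $\nu = (\nu_j)_{j=1}^m$ with $\nu_j := \#\{i : X_i\in B_j\}$, they are. Furthermore, $\mathbb E(L_j\,|\,\nu) = 1 = \mathbb E(L_j)$. Thus, a standard truncation argument shows that for any $\epsilon > 0$ and $0 < \gamma \le 1$,

$$\begin{aligned}
\mathbb E\Bigl(\Bigl|m^{-1}\sum_j L_j - 1\Bigr|\,\Big|\,\nu\Bigr)
&\le m^{-1}\,\mathrm{Var}\Bigl(\sum_j 1\{L_j\le\epsilon m\}L_j\,\Big|\,\nu\Bigr)^{1/2} + 2m^{-1}\sum_j\mathbb E(1\{L_j>\epsilon m\}L_j\,|\,\nu) \\
&\le m^{-1}\Bigl(\sum_j\mathbb E(1\{L_j\le\epsilon m\}L_j^2\,|\,\nu)\Bigr)^{1/2} + 2m^{-1}\sum_j\mathbb E(1\{L_j>\epsilon m\}L_j\,|\,\nu) \\
&\le m^{-1}\Bigl(\sum_j\mathbb E(\epsilon mL_j\,|\,\nu)\Bigr)^{1/2} + 2\epsilon^{-\gamma}m^{-(1+\gamma)}\sum_j\mathbb E(L_j^{1+\gamma}\,|\,\nu) \\
&= \epsilon^{1/2} + 2\epsilon^{-\gamma}m^{-(1+\gamma)}\sum_j\mathbb E(L_j^{1+\gamma}\,|\,\nu).
\end{aligned}$$

It thus suffices to show that

$$\inf_{\gamma\in(0,1]}\max_j m^{-\gamma}\,\mathbb E(L_j^{1+\gamma}) \to 0$$

under the stated conditions on $m$, $\Delta_\infty$ and $\Delta_2$. Note that $\mathbb E(L_j^{1+\gamma})$ is equal to $\mathbb E(f_j(X_1)^{1+\gamma})^n$ and that elementary calculus gives

$$(1+y)^{1+\gamma} \le 1 + (1+\gamma)y + \gamma(1+\gamma)y^2/2 + 3\gamma|y|^3 \quad\text{for } |y|\le 1.$$

Hence, $\mathbb E(f_j(X_1)^{1+\gamma}) \le 1 + \gamma(1+\gamma)\Delta_2^2/2 + 3\gamma\Delta_\infty\Delta_2^2$ and

$$\begin{aligned}
\max_j m^{-\gamma}\,\mathbb E(L_j^{1+\gamma})
&\le m^{-\gamma}(1 + \gamma(1+\gamma)\Delta_2^2/2 + 3\gamma\Delta_\infty\Delta_2^2)^n \\
&\le \exp(-\gamma\log m + \gamma(1+\gamma)n\Delta_2^2/2 + 3\gamma\Delta_\infty n\Delta_2^2).
\end{aligned} \tag{7.7}$$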
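
The elementary inequality for $(1+y)^{1+\gamma}$ that leads to (7.7) can be checked on a grid. The sketch below evaluates the difference between its right-hand and left-hand sides for $y\in[-1,1]$ and $\gamma\in(0,1]$; the grid size and function names are arbitrary illustrative choices.

```python
def taylor_gap(y, gamma):
    """Right-hand side minus left-hand side of the elementary inequality."""
    lhs = (1.0 + y) ** (1.0 + gamma)
    rhs = (1.0 + (1.0 + gamma) * y
           + gamma * (1.0 + gamma) * y * y / 2.0
           + 3.0 * gamma * abs(y) ** 3)
    return rhs - lhs

def min_gap(steps=200):
    """Smallest gap over a grid of y in [-1, 1] and gamma in (0, 1]."""
    ys = [-1.0 + 2.0 * i / steps for i in range(steps + 1)]
    gammas = [(i + 1) / steps for i in range(steps)]
    return min(taylor_gap(y, g) for y in ys for g in gammas)

print(min_gap())
```

The minimum is attained at $y = 0$, where both sides equal 1, so the printed value should be zero up to rounding.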

Suppose that $n\Delta_2^2 \le 2(1-b_m)\log m$, where $(0,1)\ni b_m\to 0$ and $b_m^2\log m\to\infty$ as $m\to\infty$. Then, the right-hand side of (7.7) does not exceed

$$\exp(-\gamma(1-(1+\gamma)(1-b_m))\log m + 6\gamma\Delta_\infty\log m) \le \exp\Bigl(-\frac{b_m^2\log m}{4(1-b_m)} + \frac{3Cb_m(\log m)^{1/2}}{1-b_m}\Bigr) \to 0$$

as $m\to\infty$, if $\gamma = b_m/(2(1-b_m))$.



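□

Proof of Theorem 4.1(b). Let $\tilde c_n := c_n\sqrt{\log(e/\delta_n)/n}$, and set $f_o := 1_{[0,1)}$ and

$$f_{nj}(x) := f_o(x) + 1\{x\in I_{nj}\}\,\tilde c_n\delta_n^{-3/2}(x-(j-1/2)\delta_n)$$

for $j = 1,\dots,m_n := \lfloor 1/\delta_n\rfloor$ and $I_{nj} := [(j-1)\delta_n, j\delta_n)$. Each $f_{nj}$ is a probability density with respect to the uniform distribution on $[0,1)$ such that the corresponding distribution $F_{nj}$ satisfies $F_{nj}(I_{nj}) = \delta_n$ and $\inf_{I_{nj}}f_{nj}'\cdot|I_{nj}|^2/\sqrt{F_{nj}(I_{nj})} = \tilde c_n$, that is, $f_{nj}\in\mathcal F(\delta_n,\tilde c_n)$. Thus, for any test $\phi_n(X)$ with $\mathbb E_{f_o}\phi_n(X) \le \alpha + o(1)$,

$$\begin{aligned}
\inf_{f\in\mathcal F(\delta_n,\tilde c_n)}\mathbb E_f\phi_n(X) - \alpha
&\le m_n^{-1}\sum_{j=1}^{m_n}\mathbb E_{f_{nj}}\phi_n(X) - \alpha \\
&\le \mathbb E_{f_o}\Bigl(\Bigl(m_n^{-1}\sum_{j=1}^{m_n}L_{nj}-1\Bigr)\phi_n(X)\Bigr) + o(1) \\
&\le \mathbb E_{f_o}\Bigl|m_n^{-1}\sum_{j=1}^{m_n}L_{nj}-1\Bigr| + o(1),
\end{aligned}$$

where $L_{nj} := \prod_{i=1}^n f_{nj}(X_i)$. The latter expectation tends to zero by Lemma 7.4. Here, $\Delta_2^2 = \tilde c_n^2/12$, and $\Delta_\infty = \tilde c_n\delta_n^{-1/2}/2$ is less than $\sqrt{6\log(e/\delta_n)/(n\delta_n)} = O(\log(n)^{-1/2}) = O(\log(m_n)^{-1/2})$ because $n\delta_n \ge \log(n)^2$, hence $m_n = \delta_n^{-1} + O(1) = o(n)$. Finally,

$$\sqrt{\log m_n}\,\Bigl(1-\frac{n\Delta_2^2}{2\log m_n}\Bigr) = \frac{24\log m_n - c_n^2\log(e/\delta_n)}{24\sqrt{\log m_n}} \ge \sqrt{24}^{\,-1}(\sqrt{24}-c_n)\sqrt{\log(e/\delta_n)}\,(1+o(1)) + o(1)$$

tends to infinity by assumptions on $\delta_n$ and $c_n$. □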
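
The stated properties of the perturbed densities can be verified numerically for a concrete choice of parameters. The following sketch assumes the linear perturbation with slope $\tilde c_n\delta_n^{-3/2}$ on $I_{nj}$ and checks the mass of $I_{nj}$, the total mass, and the values $\Delta_2^2 = \tilde c_n^2/12$ and $\Delta_\infty = \tilde c_n\delta_n^{-1/2}/2$; the function names and the parameter values $\delta = 0.1$, $\tilde c = 0.3$ are purely illustrative.

```python
def midpoint_integral(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def perturbation_summary(delta, c_tilde, j=1):
    """Mass of I_nj, total mass, Delta_2^2 and Delta_infty for the perturbed density f_nj."""
    lo, hi = (j - 1) * delta, j * delta
    mid = (j - 0.5) * delta
    slope = c_tilde * delta ** -1.5

    def bump(x):
        # f_nj - f_o on I_nj: a linear function vanishing at the midpoint
        return slope * (x - mid)

    mass_interval = midpoint_integral(lambda x: 1.0 + bump(x), lo, hi)
    total_mass = mass_interval + (1.0 - delta)   # f_nj equals 1 outside I_nj
    delta2_sq = midpoint_integral(lambda x: bump(x) ** 2, lo, hi)
    delta_inf = slope * delta / 2.0              # |bump| is largest at the endpoints
    return mass_interval, total_mass, delta2_sq, delta_inf

mass, total, d2, dinf = perturbation_summary(delta=0.1, c_tilde=0.3, j=3)
print(mass, total, d2, dinf)
```

Note that $f_{nj}$ is nonnegative only if $\Delta_\infty \le 1$, which holds for the values chosen here.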



Proof of Theorem 4.1(c). We may assume without loss of generality that the left endpoint of $I_n$ is 0. We now define probability densities $f_n$ and $g_n$ via

$$f_n(x) := \frac{\delta_n}{|I_n|}\,1\{x\in[0,|I_n|/\delta_n]\},$$

$$g_n(x) := f_n(x) + \frac{b_n\delta_n^{1/2}}{\sqrt n\,|I_n|^2}\,(x-|I_n|/2)\,1\{x\in I_n\}.$$

Note that $g_n \ge 0$ because $b_n \le 2\sqrt{n\delta_n}$. Furthermore, $f_n$ is nonincreasing on $I_n$, while $g_n$ belongs to $\mathcal F(I_n,\delta_n,b_n/\sqrt n)$.

We now apply LeCam's notion of contiguity [cf. LeCam and Yang (1990), Chapter 3]: If a test $\phi_n(X)$ satisfies $\mathbb E_{f_n}\phi_n(X) \le \alpha$, then $\limsup\mathbb E_{g_n}\phi_n(X) < 1$, provided that

$$\mathcal L_{f_n}\Bigl(\sum_{i=1}^n\log(g_n/f_n)(X_i)\Bigr) \to_w Q \tag{7.8}$$

for some probability measure $Q$ on the real line such that $\int e^x\,Q(dx) = 1$.

Note that $\mathcal L_{f_n}(\sum_{i=1}^n\log(g_n/f_n)(X_i))$ equals the distribution of $\sum_{i=1}^{N_n}\log(1+c_nV_i)$ with $c_n := b_n/(2\sqrt{n\delta_n})\in[0,1]$ and independent random variables $N_n, V_1, V_2, V_3, \dots$ such that $N_n\sim\mathrm{Bin}(n,\delta_n)$ and $V_i\sim\mathrm{Unif}[-1,1]$.

First, suppose that $n\delta_n \not\to \infty$. By extracting a subsequence, if necessary, we may assume that $n\delta_n\to\lambda\in[0,\infty)$ and $c_n\to c\in[0,1]$. (7.8) then holds for the distribution $Q := \sum_{k=0}^\infty p_\lambda(k)\,\mathcal L(\sum_{i=1}^k\log(1+cV_i))$ with the Poisson weights $p_\lambda(k) := e^{-\lambda}\lambda^k/k!$. But this measure $Q$ satisfies $\int e^x\,Q(dx) = 1$, whence $\limsup\mathbb E_{g_n}\phi_n(X) < 1$. This contradiction shows that $n\delta_n\to\infty$.

Second, suppose that $n\delta_n\to\infty$, but $b_n\not\to\infty$. We assume without loss of generality that $b_n\to b\in[0,\infty)$. Lindeberg's central limit theorem and elementary calculations yield (7.8) with Gaussian distribution $Q = \mathcal N(-b^2/24, b^2/12)$. Again, the limit distribution satisfies $\int e^x\,Q(dx) = 1$. Hence, $b_n\to\infty$. □
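
The normalization $\int e^x\,Q(dx) = 1$ for both limit distributions in this proof can be confirmed numerically. The sketch below integrates $e^x$ against the Gaussian law $\mathcal N(-b^2/24, b^2/12)$ by the trapezoidal rule, and evaluates the Poisson mixture using $\mathbb E\exp(\sum_{i\le k}\log(1+cV_i)) = (\mathbb E(1+cV_1))^k$; the function names, truncation levels and parameter values are illustrative assumptions.

```python
import math

def gaussian_exp_moment(b, half_width=12.0, steps=20000):
    """Trapezoidal approximation of the integral of e^x against N(-b^2/24, b^2/12), for b > 0."""
    var = b * b / 12.0
    mean = -b * b / 24.0
    sd = math.sqrt(var)
    lo, hi = mean - half_width * sd, mean + half_width * sd
    h = (hi - lo) / steps

    def integrand(x):
        # e^x times the N(mean, var) density, combined in one exponent for stability
        return math.exp(x - (x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    inner = sum(integrand(lo + i * h) for i in range(1, steps))
    return h * (0.5 * (integrand(lo) + integrand(hi)) + inner)

def poisson_mixture_exp_moment(lam, c, terms=200, steps=1000):
    """Integral of e^x against the Poisson mixture Q, truncated after `terms` summands."""
    # E(1 + c V) for V ~ Unif[-1, 1], by the midpoint rule with density 1/2
    h = 2.0 / steps
    factor = h * sum((1.0 + c * (-1.0 + (i + 0.5) * h)) / 2.0 for i in range(steps))
    total, weight = 0.0, math.exp(-lam)    # weight = p_lambda(0)
    for k in range(terms):
        total += weight * factor ** k
        weight *= lam / (k + 1)            # p_lambda(k + 1) from p_lambda(k)
    return total

print(gaussian_exp_moment(2.0), poisson_mixture_exp_moment(3.0, 0.7))
```

Both values should be close to 1, in line with the closed form $\exp(\mu+\sigma^2/2)$ for the Gaussian case with $\mu = -\sigma^2/2$.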

Theorem 4.2 concerns our specific multiscale procedure. It will be derived 
from the following basic result. 

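Lemma 7.5. For a bounded open interval $I$ and $\delta\in(0,1]$ let $f$ be a density in $\mathcal F(I,\delta,D\sqrt{\log(e/\delta)/n})$ with $D > \sqrt{24}$. Then,

$$n\delta \ge \tilde D\max(\log(e/\delta), K\log(en)),$$

where $\tilde D := D^2/4$ and $K \ge 1 - (\log\tilde D + \log\log(en))/\log(en)$. Suppose that

(7.9)

for certain numbers $\epsilon\in(0,1)$, $\gamma\in(0,1/2]$ and $\eta > 0$. Then

$$\mathbb P(\mathcal D^+(\alpha)\text{ contains no interval } J\subset I) \le \exp(-n\delta\gamma^2/2) + 2\exp(-D\sqrt{n\delta\log(e/\delta)}\,\epsilon^2/8) + \exp(-\eta^2/2).$$

Proof. The inequalities $2\sqrt\delta \ge H(f,I) \ge D\sqrt{\log(e/\delta)/n}$ entail that $n\delta \ge \tilde D\log(e/\delta)$. Now, write $n\delta = \tilde DK\log(en)$ for some $K > 0$. In the case of $K < 1$,

$$\tilde DK\log(en) \ge \tilde D\log(e/\delta) = \tilde D(\log(en) - \log(\tilde DK\log(en))) \ge \tilde D(\log(en) - \log(\tilde D\log(en))),$$

and dividing both sides by $\tilde D\log(en)$ yields the asserted lower bound for $K$.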

The number $N := \#\{i : X_{(i)}\in I\}$ has distribution $\mathrm{Bin}(m,\delta)$ with $m\in\{n,n+1,n+2\}$. Consequently it follows from Chernoff's exponential inequality for binomial distributions [cf. van der Vaart and Wellner (1996), A.6.1] that

$$\mathbb P(N < (1-\gamma)n\delta) \le \exp(-n\delta\gamma^2/2).$$

Since $D > \sqrt{24}$ by assumption, we can conclude that $n\delta \ge D^2/4 \ge 6$, so $(1-\gamma)n\delta \ge 3$. In the case of $N \ge 3$, let $j := \min\{i : X_{(i)}\in I\}$ and $k := \max\{i : X_{(i)}\in I\}$, that is, $N = k-j+1$. In order to bound the probability of $|I_{jk}|/|I| < 1-\epsilon$, we write $I = (a,b)$ and define $I^{(\ell)} := (a,a+\epsilon|I|/2]$, $I^{(r)} := [b-\epsilon|I|/2,b)$. Then,

$$nF(I^{(r)}) \ge nF(I^{(\ell)}) \ge n\inf_I f'\cdot|I^{(\ell)}|^2/2 = nH(f,I)\sqrt\delta\,\epsilon^2/8 \ge D\sqrt{n\delta\log(e/\delta)}\,\epsilon^2/8,$$

whence

$$\begin{aligned}
\mathbb P(N\le 1\text{ or }|I_{jk}|/|I| < 1-\epsilon)
&\le \mathbb P(\text{no observations in } I^{(\ell)}) + \mathbb P(\text{no observations in } I^{(r)}) \\
&\le 2\exp(-D\sqrt{n\delta\log(e/\delta)}\,\epsilon^2/8).
\end{aligned}$$

Hereafter, we will always assume that $N \ge (1-\gamma)n\delta$ and $|I_{jk}|/|I| \ge 1-\epsilon$. By $\mathbb P^*(\cdot)$, we denote conditional probabilities given these two inequalities. The definition of $\mathcal D^+(\alpha)$ implies that $\mathbb P^*(\mathcal D^+(\alpha)$ contains no $J\subset I)$ is not greater than $\mathbb P^*(T_{jk}(X) \le c_{jk}(\alpha))$. On the other hand, it follows from Proposition 4.1 that

$$\mathbb P^*\Bigl(T_{jk}(X) \le \frac{\tilde\delta(N-2)}{6} - \eta\sqrt{\frac{N-2}{3}}\Bigr) \le \exp(-\eta^2/2) \quad\text{for any } \eta \ge 0,$$

where $\tilde\delta := H(f,I_{jk})/\sqrt{F(I_{jk})}$. It thus suffices to show that

$$\frac{\tilde\delta(N-2)}{6} - \eta\sqrt{\frac{N-2}{3}} \ge c_{jk}(\alpha).$$

By the definition of $c_{jk}(\alpha)$ this is equivalent to

However, the left-hand side is not smaller than



(1 - efH (/, I) jN - 2 (1 - e) 2 #(/, 7)^(1-7)^-2 



v 7 ^ V 12 ~ Vl25 



> D (lz^VO r(5) 



>r(5) + K „(a) + r ? + 



r(5) 

with 7 := 7 + 2/(nS), whereas F((N - l)/(n + 1)) < T((iV - 2)/n) is not 
greater than 

r(5(i - 7)) < r(5) - io g (i - 7 )/r(5) < r(<j) + 7 /r(5). □ 

Proof of Theorem 4.2. First note that (4.2) and the first part of Lemma 7.5 entail that

$$n\delta_n \ge C^2/4 \ge 6 \quad\text{and}\quad n\delta_n \ge (C^2/4 + o(1))\log n.$$

In particular, $\#\mathcal I_n \le \delta_n^{-1} = o(n)$.

We apply Lemma 7.5 to $f = f_n$ and all intervals $I\in\mathcal I_n$. More precisely, we shall introduce suitable numbers $\gamma_n\in(0,1/2]$, $\epsilon_n\in(0,1)$ and $\eta_{n,I} > 0$. According to Lemma 7.5, the probability that some $I\in\mathcal I_n$ does not cover an interval from $\mathcal D^+(\alpha)$ is bounded by

$$\#\mathcal I_n\bigl(\exp(-n\delta_n\gamma_n^2/2) + 2\exp(-C\sqrt{n\delta_n\log(e/\delta_n)}\,\epsilon_n^2/8)\bigr) + \sum_{I\in\mathcal I_n}\exp(-\eta_{n,I}^2/2), \tag{7.10}$$

provided that

A r(Fn(/))/ " (i-e„) 2 v / r^V r(F n (/)) r(i? n (j))5 

for all $I\in\mathcal I_n$, where $\tilde\gamma_n := \gamma_n + 2/(n\delta_n) = O(1)$. Also note that $\kappa_n(\alpha) = O(1)$, by virtue of Theorem 3.1. Hence, the preceding requirement is met if, for every constant $A > 0$ and sufficiently large $n$,

(7.11) cfl + T A)> , ^, . (l + A + 



r(F„(/));- (i-e„) 2 vr^KV r(F„(/)), 

for all $I\in\mathcal I_n$.

In setting (i), we use constants $\gamma_n = \gamma\in(0,1/2]$ and $\epsilon_n = \epsilon\in(0,1)$ to be specified later and define

$$\eta_{n,I} := \sqrt{2\log(1/F_n(I)) + b_n} \le \Gamma(F_n(I)) + \sqrt{b_n}.$$

Since $\delta\log(e/\delta)$ is nondecreasing in $\delta\in(0,1]$, it follows from $n\delta_n \ge (C^2/4 + o(1))\log n$ that

$$\sqrt{n\delta_n\log(e/\delta_n)} \ge (C/2 + o(1))\log n.$$

Hence, the bound in (7.10) equals

$$o(1)\cdot\bigl(\exp(-(C^2\gamma^2/8 - 1 + o(1))\log n) + \exp(-(C^2\epsilon^2/16 - 1 + o(1))\log n)\bigr) + \sum_{I\in\mathcal I_n}F_n(I)\exp(-b_n/2)$$

and tends to zero, provided that $\gamma \ge \sqrt8/C$ and $\epsilon \ge 4/C$. Moreover, the right-hand side of (7.11) is not greater than

\/24 ^ A + 



( i_ e) 2 v / 1 _ 7 _ o(1) v r(F n (/)) 

2^24 + o(l) f i+ o{b n ) 



(i- e 2 )yr^v r(F„(j)) 

Hence, the conclusion for setting (i) is correct if, say, $\epsilon = 4/(2\sqrt{24}) = 1/\sqrt6$ and $\gamma = \sqrt8/(2\sqrt{24}) = \sqrt{1/12}$, while $C$ is strictly larger than

$$\frac{2\sqrt{24}}{(1-\epsilon)^2\sqrt{1-\gamma}} < 34.$$
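
The numerical value of this threshold follows from plain arithmetic. The short sketch below evaluates it for the stated choices of $\epsilon$ and $\gamma$ and also confirms that these choices satisfy $\gamma \ge \sqrt8/C$ and $\epsilon \ge 4/C$ once $C$ exceeds the threshold; the variable names are illustrative.

```python
import math

eps = 4.0 / (2.0 * math.sqrt(24.0))               # equals 1 / sqrt(6)
gamma = math.sqrt(8.0) / (2.0 * math.sqrt(24.0))  # equals sqrt(1 / 12)

# Lower threshold for the constant C in setting (i)
threshold = 2.0 * math.sqrt(24.0) / ((1.0 - eps) ** 2 * math.sqrt(1.0 - gamma))

# Conditions needed for the bound in (7.10) to vanish, evaluated at C = threshold
conditions_ok = gamma >= math.sqrt(8.0) / threshold and eps >= 4.0 / threshold

print(threshold, conditions_ok)
```

The threshold comes out slightly above 33, consistent with the bound of 34 stated in the text.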
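In settings (ii)-(iii), we define

$$\gamma_n := (2\log(D\#\mathcal I_n)/(n\delta_n))^{1/2},$$

$$\epsilon_n := \bigl((8/C)\log(D\#\mathcal I_n)/\sqrt{n\delta_n\log(e/\delta_n)}\bigr)^{1/2},$$

$$\eta_{n,I} := \begin{cases}\sqrt{2\log(1/F_n(I)) + b_n}, & \text{in setting (ii)},\\ b_n/D, & \text{in setting (iii)},\end{cases}$$

for some (large) constant $D > 1$. The bound in (7.10) is then not greater than

$$3/D + \begin{cases}\sum_{I\in\mathcal I_n}F_n(I)\exp(-b_n/2), & \text{in setting (ii)}\\ \exp(\log\#\mathcal I_n - b_n^2/(2D^2)), & \text{in setting (iii)}\end{cases} \;=\; 3/D + o(1).$$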

It thus remains to verify (7.11).

Note that $\gamma_n\to 0$, by assumption. Moreover, since $\#\mathcal I_n \le \delta_n^{-1}$, the term $\log(D\#\mathcal I_n)$ is not greater than $\log(D/\delta_n)^{1/2}\log(D\#\mathcal I_n)^{1/2}$, whence

$$\epsilon_n \le \sqrt{8/C}\,(\log D)^{1/4}(\log(D\#\mathcal I_n)/(n\delta_n))^{1/4} \to 0.$$

Hence, in setting (ii), the right-hand side of (7.11) is not greater than

so (7.11) is satisfied for sufficiently large $n$, if $C > 2\sqrt{24}$. In setting (iii), the right-hand side of (7.11) is not greater than

24( 1 + 0fi „ + e „) + (1 + 0(1)) ±t^). 

By the first part of Lemma 7.5, $n\delta_n \ge (C^2/4)\log(e/\delta_n) \ge (C^2/8)\Gamma(F_n(I))^2$ for all $I\in\mathcal I_n$. Thus,

„ ^ 0(log(£>#T ra )V 2 ) (b n ) 



Consequently, (7.11) is satisfied if $C > \sqrt{24}$. □